Pivot-Based Topic Models for Low-Resource Lexicon Extraction

Authors

John Richardson

Toshiaki Nakazawa

Sadao Kurohashi

Abstract

This paper proposes a range of solutions to the challenges of extracting large and high-quality bilingual lexicons for low-resource language pairs. In such scenarios there is often no parallel or even comparable data available. We design three effective pivot-based approaches inspired by the state-of-the-art technique of bilingual topic modelling, extending previous work to take advantage of trilingual data. The proposed models are shown to outperform traditional methods significantly and can be adapted based upon the nature of available training data. We demonstrate the accuracy of these pivot-based approaches in a realistic scenario generating an Icelandic-Korean lexicon from Wikipedia.

Related resources

Pivot, Box and Trilingual: Lexicon Extraction for Low-Resource Language Pairs with Extended Topic Models

Data-driven approaches to natural language processing have been shown to be greatly effective, and the case of bilingual lexicon extraction is no exception. While training data is readily available for many language pairs, many existing approaches fail for languages for which there simply does not exist parallel data. While there have been many studies on bilingual lexicon extraction, there has...

Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. In this paper, in order to show validity and usability of the pivot-based approach, we evaluate the approach in company with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word as...

Constraint-Based Bilingual Lexicon Induction for Closely Related Languages

The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have been proven useful to induce bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on the assumption, we propose...

Bilingual Multi-Word Lexicon Construction via a Pivot Language

Bilingual multi-word lexicons are helpful for statistical machine translation systems to improve their performance. In this paper we present a method for constructing such lexicons in a resource-poor language pair such as Korean-French. By using two parallel corpora sharing one pivot language we can easily construct such lexicons without any external language resource like a seed dictionary. Th...

Augmenting Phrase Table by Employing Lexicons for Pivot-based SMT

Pivot language is employed as a way to solve the data sparseness problem in machine translation, especially when the data for a particular language pair does not exist. The combination of source-to-pivot and pivot-to-target translation models can induce a new translation model through the pivot language. However, the errors in two models may compound as noise, and still, the combined model may ...

Publication date: 2015
